Automatic Text Formatting for Social Media based on Linefeed and Comma Insertion
نویسندگان
چکیده
By appearance of social media, people are coming to be able to transmit information easily on a personal level. However, because users of social media generally spend little time on describing information, low-quality texts are transmitted and it blocks the spread of information. On transmitted texts in social media, commas and linefeeds are inserted incorrectly, and it becomes a factor of low-quality texts. This paper proposes a method for automatically formatting Japanese texts in social media. Our method formats texts by inserting commas and linefeeds appropriately. In our method, the positions where commas and linefeeds should be inserted are decided based on machine learning using morphological information, dependency relation and clause boundary information. An experiment using Japanese spoken language texts has shown the effectiveness of our method.
منابع مشابه
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
متن کاملAutomatic Comma Insertion for Japanese Text Generation
This paper proposes a method for automatically inserting commas into Japanese texts. In Japanese sentences, commas play an important role in explicitly separating the constituents, such as words and phrases, of a sentence. The method can be used as an elemental technology for natural language generation such as speech recognition and machine translation, or in writing-support tools for non-nati...
متن کاملAutomatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations
To enhance readability and usability of speech recognition results, automatic punctuation is an essential process. In this paper, we address automatic comma prediction based on conditional random fields (CRF) using lexical, syntactic and pause information. Since there is large disagreement in comma insertion between humans, we model individual tendencies of punctuation using annotations given b...
متن کاملLinefeed Insertion into Japanese Spoken Monologue for Captioning
To support the real-time understanding of spoken monologue such as lectures and commentaries, the development of a captioning system is required. In monologues, since a sentence tends to be long, each sentence is often displayed in multi lines on one screen, it is necessary to insert linefeeds into a text so that the text becomes easy to read. This paper proposes a technique for inserting linef...
متن کاملAutomatic Linefeed Insertion for Improving Readability of Lecture Transcript
The development of a captioning system that supports the real-time understanding of monologue speech such as lectures and commentaries is required. In monologues, since a sentence tends to be long, each sentence is often displayed in multi lines on the screen and becomes unreadable. In the case, it is necessary to insert linefeeds into a text so that the text becomes easy to read. This paper pr...
متن کامل